Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

Authors

  • Norihiro Maeda
  • Takeya Kasukawa
  • Rieko Oyama
  • Julian Gough
  • Martin Frith
  • Pär G Engström
  • Boris Lenhard
  • Rajith N Aturaliya
  • Serge Batalov
  • Kirk W Beisel
  • Carol J Bult
  • Colin F Fletcher
  • Alistair R. R Forrest
  • Masaaki Furuno
  • David Hill
  • Masayoshi Itoh
  • Mutsumi Kanamori-Katayama
  • Shintaro Katayama
  • Masaru Katoh
  • Tsugumi Kawashima
  • John Quackenbush
  • Timothy Ravasi
  • Brian Z Ring
  • Kazuhiro Shibata
  • Koji Sugiura
  • Yoichi Takenaka
  • Rohan D Teasdale
  • Christine A Wells
  • Yunxia Zhu
  • Chikatoshi Kai
  • Jun Kawai
  • David A Hume
  • Piero Carninci
  • Yoshihide Hayashizaki
Abstract

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
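The counts quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it assumes that the 102,801 total is the sum of the FANTOM2 set and the newly isolated FANTOM3 cDNAs, which the abstract implies but does not state outright.

```python
# Figures taken from the abstract; variable names are illustrative only.
fantom2_cdnas = 60_770        # full-length enriched cDNAs in FANTOM2
new_fantom3_cdnas = 42_031    # newly isolated cDNAs annotated in FANTOM3
total_annotated = 102_801     # total annotated cDNAs reported
protein_coding = 56_722       # transcripts annotated as protein coding

# Assumption: the reported total is FANTOM2 plus the new FANTOM3 cDNAs.
combined = fantom2_cdnas + new_fantom3_cdnas

# Share of annotated transcripts classified as protein coding.
coding_share = protein_coding / total_annotated

print("Sum matches reported total:", combined == total_annotated)
print(f"Protein-coding share: {coding_share:.1%}")
```

Under that assumption the two subsets add up to the reported total, and a little over half of the annotated transcripts fall in the protein-coding class.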

Similar Resources

CAGE Basic/Analysis Databases: the CAGE resource for comprehensive promoter analysis

Cap-analysis gene expression (CAGE) Basic and Analysis Databases store an original resource produced by CAGE, which measures expression levels of transcription starting sites by sequencing large amounts of transcript 5' ends, termed CAGE tags. Millions of human and mouse high-quality CAGE tags derived from different conditions in >20 tissues consisting of >250 RNA samples are essential for iden...

SPA: A Probabilistic Algorithm for Spliced Alignment

Recent large-scale cDNA sequencing efforts show that elaborate patterns of splice variation are responsible for much of the proteome diversity in higher eukaryotes. To obtain an accurate account of the repertoire of splice variants, and to gain insight into the mechanisms of alternative splicing, it is essential that cDNAs are very accurately mapped to their respective genomes. Currently availa...

The Ensembl gene annotation system

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct...

Distinguishing Protein-Coding from Non-Coding RNAs through Support Vector Machines

RIKEN's FANTOM project has revealed many previously unknown coding sequences, as well as an unexpected degree of variation in transcripts resulting from alternative promoter usage and splicing. Ever more transcripts that do not code for proteins have been identified by transcriptome studies, in general. Increasing evidence points to the important cellular roles of such non-coding RNAs (ncRNAs)....

P-107: The Effects of Cryotop Vitrification on Heat Shock Protein 72 Expression in Mouse 2-Cell Embryos by Nested Quantitative PCR

Background: The aim of the study was to compare the effects of two different concentrations of cryoprotectants by Cryotop vitrification on survival and Heat shock protein 72 (Hspa1a) expression of two-cell mouse embryos. Materials and Methods: Different cryoprotectants’ concentrations of the combination of dimethyl sulfoxide (DMSO) and ethylene glycol (EG) were used and compared with each other...

Journal:
  • PLoS Genetics

Volume 2  Issue 

Pages  -

Publication date 2006